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Abstract 

Background: Extensive and automated data integration in bioinformatics facilitates the construction of large, 
complex biological networks. However, the challenge lies in the interpretation of these networks. While most 
research focuses on the unipartite or bipartite case, we address the more general but common situation of 
/c-partite graphs. These graphs contain k different node types and links are only allowed between nodes of 
different types. In order to reveal their structural organization and describe the contained information in a more 
coarse-grained fashion, we ask how to detect clusters within each node type. 

Results: Since entities in biological networks regularly have more than one function and hence participate in more 
than one cluster, we developed a /c-partite graph partitioning algorithm that allows for overlapping (fuzzy) clusters. 
It determines for each node a degree of membership to each cluster. Moreover, the algorithm estimates a 
weighted /c-partite graph that connects the extracted clusters. Our method is fast and efficient, mimicking the 
multiplicative update rules commonly employed in algorithms for non-negative matrix factorization. It facilitates 
the decomposition of networks on a chosen scale and therefore allows for analysis and interpretation of structures 
on various resolution levels. Applying our algorithm to a tripartite disease-gene-protein complex network, we were 
able to structure this graph on a large scale into clusters that are functionally correlated and biologically 
meaningful. Locally, smaller clusters enabled reclassification or annotation of the clusters' elements. We exemplified 
this for the transcription factor MECP2. 

Conclusions: In order to cope with the overwhelming amount of information available from biomedical literature, 
we need to tackle the challenge of finding structures in large networks with nodes of multiple types. To this end, 
we presented a novel fuzzy /c-partite graph partitioning algorithm that allows the decomposition of these objects 
in a comprehensive fashion. We validated our approach both on artificial and real-world data. It is readily 
applicable to any further problem. 



Background 

With the increasing availability of high throughput 
"-omics" technologies such as next generation sequen- 
cing, proteomics or metabolic profiling an enormous 
amount of textual data is accessible in the biomedical 
literature. Hence, methods able to structure these het- 
erogeneous data and to extract new knowledge gain 
more and more importance. Learning approaches often 
focus on the analysis of homogeneous data sets that can 
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be represented as graphs having vertices of a single type 
only. However, biological networks are complex and 
highly diverse and therefore often involve objects of 
multiple types, forming /c-partite graphs consisting of 
different kinds of vertices. We use this representation as 
it provides a more comprehensive picture of the under- 
lying structure compared to the widely used graph 
transformations. These so-called projections - e.g. of a 
bipartite network into an unipartite version - discard 
important information [1]. For instance, [2] shows that 
in the case of metabolism the use of projections leads to 
wrong interpretations of some of the most relevant 
graph attributes, whereas the bipartite view offers a clea- 
ner interpretation of its topological features. 
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The human disease network presented in [3] is an 
example for a bipartite graph having two disjoint sets of 
vertices. Here, structural questions need to be addressed 
outside of the unipartite graph setting. One set of nodes 
represents all known genetic disorders, the vertices of 
the other partition correspond to all known disease 
genes in the human genome. A disorder and a gene are 
connected if mutations in that gene are implicated in 
that disorder. Other examples of bipartite networks are 
protein complex or gene-localization, gene-function or 
microRNA-target networks. The integration of such net- 
work data then leads to complex /c-partite graphs. 

A key question is how to interpret the internal organi- 
zation of these networks. A possible answer may be a 
modular decomposition, which implies the coexistence 
of structural subunits associated with more highly inter- 
connected parts. We regard the identification of these a 
priori unknown building blocks - such as for instance 
functional modules in protein-protein-interaction (PPI) 
networks - as clustering methods. The clusters and their 
interconnections are essential for understanding the 
underlying functional properties. They structure biologi- 
cal data by compressing their information into a con- 
densed form. 

Most available clustering methods do not treat the 
single node types (partitions) separately and therefore 
do not represent the global cluster structure of /c-partite 
networks correctly. While this has been addressed in 
terms of algorithms for some time now [4-6], not many 
applications were successfully implemented in bioinfor- 
matics yet, although the field commonly deals with such 
networks [1]. A particular issue that may hamper appli- 
cation to biological data is that most existing algorithms 
identify separated, disjoint clusters by assigning each 
point to exactly one cluster [7,8]. This is unrealistic for 
biological systems as e.g. genes or proteins commonly 
participate in multiple processes or pathways [9]. So far, 
only a few approaches exist that allow the detection of 
overlapping clusters. These either assign each data point 
to several hard clusters [10] or determine a degree of 
membership to each cluster [11,12]. Such methods are 
known as fuzzy clustering, but have not been applied to 
the common biological case of /c-partite graphs. 

To overcome these difficulties we developed a novel 
fuzzy clustering algorithm based on a non-negative 
matrix factorization (NMF) model [13]. Our algorithm 
extends a hard clustering algorithm recently put forward 
in [14]. This algorithm clusters each node type of the 
graph separately and then connects clusters via a smal- 
ler, weighted /c-partite graph in an alternating minimiza- 
tion procedure. Thereby, the cluster assignment in the 
first step is made in a binary fashion. This disjoint clus- 
tering is a feature that is often achieved by soft cluster- 
ing algorithms when not forcing explicit cluster overlap 



[11]. However, it can be easily seen that the cost func- 
tion proposed is not fully minimized. Our computation- 
ally efficient and scalable algorithm avoids this problem. 
It is similar in structure to multiplicative algorithms for 
NMF, with the difference that we address a three-matrix 
factorization problem (see e.g. [15]), and have to deal 
with a multi-summand cost function. As our cost func- 
tion is monotonous with respect to the number of clus- 
ters, our algorithm allows detection of clusters on 
different scales. Hence, we are able to decompose the 
network on different resolution levels. 

The manuscript is organized as follows. In the first 
part, we develop the fuzzy clustering algorithm and 
validate it on a toy example and graphs with known 
modular structure. Then, we apply it to a tripartite dis- 
ease-gene-protein complex graph representing an 
expanded view of the human disease network from [3] 
extended by protein complexes [16]. By integrating 
functional annotation we demonstrate that we are able 
to structure this complex graph into biologically mean- 
ingful clusters on a large scale. Finally, focusing on the 
small-scale architecture, we identify overlapping clusters 
that give a more comprehensive picture about gene- 
disease connections rather than looking at disjoint clus- 
ters alone. We exemplify how this clustering allows for 
reclassification, annotation or even detection of misclas- 
sified elements on a local level. 

Results and Discussion 

A /c-partite graph is a graph G = (V, E) of edges E 
between a set of vertices V together with a partition of 
the vertices into k disjoint subsets V t such that no two 
vertices in the same subset are adjacent. For k = 1 this 
reduces to the standard graph, where we do not take 
into account different node types. Graphs with two par- 
titions are called bipartite. Let n t := | V/| be the number 
of vertices in the partition i, i - 1 ... k. We represent the 
graph as a set of n t x n ; -dimensional adjacency matrices 
for all U j with 1 < i <j < k. Typically, each matrix 
element is either 0 or 1, but we only restrict the 
matrices to have non-negative coefficients thereby allow- 
ing weighted graphs as well. 

Approach 

We want to approximate G by a smaller /c-partite cluster 
network H which we call backbone network It is defined 
on the fuzzy clusters of each G-partition V i% We fix the 
number of clusters in partition i to We denote a 
non-negative n t x m r dimensional matrix C w to be a 
fuzzy clustering of V it if it is right-stochastic, i.e. 

5^0$ =1 for all k. Its (k, /)-th element C$ gives 

the degree of membership of the original node k to the 
backbone node /. 
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Figure 1 Illustration of the fuzzy clustering approach. We want to approximate the tripartite example graph G in (a) by a smaller tripartite 
cluster network H, the so-called backbone graph (b). The decomposition into fuzzy clusters connected by this backbone must explain the 
original connectivity as good as possible. The edges of G are collected in adjacency matrices A ( ^ connecting the elements of the partitions /' and 
j. The approximation of G by the backbone graph is encoded in the adjacency matrices B (v) connecting the fuzzy node clusters C (/) . These 
matrices C (/) collect the degrees of membership of each node of partition V-, to each cluster of this type. Its (k, /)-th element specifies how 
much node k belongs to the backbone node /. 



Then we search for a /c-partite graph H with x m y - 
adjacency matrices and a fuzzy clustering C := (C w ) 
i = i,...,k such that the connectivity explained by H is as 
close as possible to G after clustering according to 
C. Figure 1 shows an example graph and its approxima- 
tion by a backbone network. From the approximation, 

we can easily reconstruct an edge A$ between two 

nodes u and v from partitions i and / in the original 
graph. To this end, we have to sum up all edge weights 
B (/;) in the backbone graph that connect the commu- 
nities u and v are assigned to. Of course, in a fuzzy 
environment these contributions have to be weighted by 
the nodes' degrees of membership C w and C^, respec- 
tively. Taken together, the entry of the adjacency matrix 
can be reconstructed as the double sum 



^uv ux x y v y ' 

x=l y=l 



Writing this in matrix notation, we see that the 
requirement of explaining maximum possible connectiv- 
ity means that the adjacency matrices A (//) are best pos- 
sible approximated by factorizations of the form 



We can measure the difference between the two 
graphs H and G in a variety of ways. In [14], this choice 
has been circumvented by focusing on arbitrary Breg- 
man divergences, see e.g. [17], which still allow efficient 
reformulation of gradient-type algorithms without 
knowing the specific formula. This is also possible in 
our case of multiplicative update rules, as has been 
shown for NMF by [15]. Here, we choose the minimum 
square distance as the cost function. This implies mini- 
mization of 



f[H,C) := ^||A (W -C W B ( «(C W ) T 



i<] 



where 



denotes the squared Frobenius norm i.e. 



A (ij) «C (i) B (ij " ) (C°" ) ) 1 



the square sum of the matrix elements. This cost func- 
tion is obviously monotonous with respect to the num- 
ber of clusters in each partition. 
Algorithm formulation and relation to other work 
We aim at minimizing the cost function C) using a 
local algorithm extending gradient descent. In order to 
avoid the choice of update rates and to ensure positivity 
of both the backbone network and the degrees of mem- 
bership of all nodes, we employ multiplicative update 
rules. This strategy is widely used in algorithms for non- 
negative matrix factorization (NMF) [18]. We find the 
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following update rules (see Methods for the detailed 
derivation): 



B 0/') <_ B (y) . 



p(0 >_ p(0 
^ r5 ^— ^ r5 



(( C W)T A «0 C 0)) 
( (C«) T C«B^(C^) T C^) 

V A (,}) C (,) (B (,j ') 



^^ c (0 B (y)(cW) T cW(B^) T 



We note that these update rules do not increase the 
cost function (1). This can be shown via auxiliary func- 
tions similar to NMF [18] and multi-factor NMF [15], 
which has been applied in a related model for co-clus- 
tering of microarray data [19]. This theoretical result 
implies convergences of the update rules. However, in 
contrast to early statements in NMF [18], it does not 
necessarily imply convergence to stationary points of the 
Euclidean norm (zero of the differential from (1)), since 
the update steps may be too small to reach those points. 
Another possible drawback of such multiplicative 
updates is the fact that once a matrix entry has been set 
to zero (which may happen due to zeros in A (//) or to 
numerics), the coefficient will never then be able to 
become positive again during learning. This is one of 
the reasons, why sometimes alternating least-squares 
algorithms are chosen [20]. 

We have not yet taken into account the constraint 
that the fuzzy clusterings C w are required to be right- 
stochastic. We force this constraint by regularly project- 
ing each row of C w onto the sphere of the 1-norm. The 
final fuzzy /c-partite clustering algorithm is summarized 
in Figure 2. Implementations are available as Additional 
Files 1 and 2. 

Our algorithm has two nested loops over the number 
of partitions /<, hence its runtime depends quadratically 
on this number. The update rules for C w and how- 
ever are fully vectorized and contain only matrix opera- 
tions with non-square matrices. Their time complexity 
is dominated by the occurring matrix products: multi- 
plying two matrices of sizes s 1 x s 2 and s 2 x s 3 is of 
complexity Q (5^253). Assuming that the cluster num- 
bers wii are smaller than the largest two partition sizes, 
the total time complexity of the fuzzy clustering algo- 
rithm can then be estimated as 

# iterations x 0(fe 2 n maxl n max2 m max ). 

Here, H max i and n max2 denote the sizes of the largest 
and the second-largest partition, m max is the maximum 
number of clusters within any partition. Hence, the run- 
time grows only quadratically in the total number of 



nodes in the case of graphs with similarly large parti- 
tions. In general, the runtime is linear in each partition's 
size Hi and cluster number Wj. We additionally con- 
firmed this theoretical analysis by simulations shown in 
Additional File 3. 
Algorithm evaluation - Toy example 

For illustration, we applied our algorithm to a bipartite 
graph having several vertices connected with all vertices 
of the other partition (e.g. nodes 1 and 10). Figure 3 
shows that these vertices are assigned to two clusters 
with distinct degree of membership, whereas vertices 
partially connected are element of a single cluster only 
(e.g. 3). This demonstrates the idea and importance of 
using a fuzzy that allows for overlapping clusters. 
Algorithm evaluation - Performance analysis 
Before applying our algorithm to real-world data, we 
tested its behavior on simulated data with controlled 
cluster structure. In particular, we compared it to the 
hard clustering algorithm from [14]. We used exactly the 
same stopping criteria for both algorithms. We generated 
random, modularly structured /c-partite networks as 
described in the Methods section. In order to compare 
algorithm performance, we determined runtime, final 
cost function value and the quality of cluster estimation 
in four different settings. We restricted ourselves to 
bipartite and layered tripartite graphs with two different 
noise settings because Long et al. provided code for ana- 
lyzing these special cases only. 

We found that while the method of Long et al. per- 
formed around two times faster, our algorithm produced 
around 10% lower cost function and was able to esti- 
mate the cluster structure better (see Figure 4). This dif- 
ference in algorithm runtime originates from the much 
more fine-tuning of the continuous degrees of member- 
ship compared to hard cluster assignments. These 
require less update steps until convergence. 
Algorithm evaluation - Stability of clusters 
In contrast to deterministic methods like for instance 
singular value decomposition (SVD), NMF-based meth- 
ods have problems concerning robust computation. 
Even for standard unipartite NMF there is no unique 
global minimum of the cost function [21]. Our algo- 
rithm aims to minimize the cost function using a local 
optimization strategy extending gradient descent. This 
implies that the algorithm only converges to a local 
minimum. The algorithm is indeterministic, it does not 
converge to the same solution on each run due to the 
stochastic nature of initial conditions. Thus, following 
the general proceeding in literature on NMF [21,22], we 
compare the local minima from several different starting 
points (multiple restarts), using the results of the best 
local minimum found. 

In order to illustrate the stability of the fuzzy cluster- 
ing algorithm we applied it to a toy network with well 
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Algorithm 1 fuzzy /c-partite clustering 



Input: /c-partite graph G with possibly non-negatively weighted edge matrices i < j, num- 

ber of clusters mi , . . . , rrik 
Output: fuzzy clustering and /c-partite cluster graph H given by matrices 



l Initialize C^B^ to random non-negative matrices. 



JO 



/(E 



t °rt 



for all i, r, s 



2 Normalize Cr 
repeat 

update fuzzy clusters 
for z «— 1, . . . , fc do 

Normalize ^— dr) /(%2 t 0$) for all r, s 
end 

update k -partite cluster graph H 
for i <— 1, . . . , fc — ldo 
for j «— i + 1 , . . . , /c do 

Bfo') <- B^') (8) (CW T AW'CW) 0 (CW T CWBW)CW T CW)) 
end 
end 

until convergence] 

Note: 0 and 0 symbolize element-wise multiplication and division, respectively. 
Figure 2 Fuzzy clustering algorithm. Summarization of the final fuzzy /c-partite clustering algorithm. 



defined cluster structure using multiple restarts. We clustering results in more than 70% of the runs. Hence, 

compared the clustering results of 100 runs and quanti- we can not guarantee that the local optimization finds a 

fied the cluster stability using a fuzzy variant of the rand global minimum of the cost function, and with this the 

index recently proposed in [23]. As we show in Addi- cluster structure of a graph. This illustrates the critical 

tional File 4, our algorithm is able to reproduce the true need for multiple restarts. 




Figure 3 Illustration of the cluster decomposition of a bipartite toy example (a) We demonstrate the graph decomposition with our 
algorithm on a small bipartite graph with overlapping cluster structure. The original graph consists of partitions V y = {1 ... 4} (red filled nodes) 
and V 2 = {5 ... 10} (blue filled nodes) connected by edges A (12) colored in black. We decomposed it into two clusters for partition V } and three 
clusters for partition V 2 . The resulting fuzzy clustering is illustrated as a weighted graph connecting original nodes to cluster nodes (framed red 
and blue). The cluster assignments C (1) and C (2) are indicated by dashed lines, where the coloring corresponds to the degree of cluster 
membership. The interconnections of the clusters form the backbone graph, encoded in the adjacency matrix B (12) which we denote by 
continous lines where color indicates the edge weight. Another way of illustrating the graph decomposition is shown in (b). It is clearer 
especially for larger graphs. First, we plot hierarchical clusterings of the nodes' degrees of membership in partitions V } and V 2 (encoded by C (1) 
and C (2) ). This facilitates the identification of overlapping clusters (e.g. nodes 1 and 10 are assigned to more than one cluster) or hard cluster 

assignments (e.g. node 5). The backbone graph B (12) is shown bottom right. This backbone graph is densely connected in our example. 

v J 
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Figure 4 Performance on toy models. We validated our algorithm on graphs with predefined cluster structure. To this end, we compared it 
with the hard clustering method by Long et al. on four different random toy models, see Table 1. The plot shows the mean relative deviation 
between the two algorithms relative to the results of the hard clustering. Error bars denote standard deviations over 1000 runs. We see that the 
fuzzy cluster assignments of our method require much more runtime, but both cost function and data estimation error (see Methods) are 
significantly smaller. The large standard deviations show the dependency of the decomposition on the random initial conditions. Therefore, by 
default we perform multiple restarts with different initializations. 



Structuring biological data 

In order to exemplify the analysis of biological networks, 
we applied our algorithm to a layered tripartite disease- 
gene-protein complex network, see Figure 5a for an 
illustration. In this graph, a disorder and a gene are con- 
nected if mutations in that gene are implicated in that 
disorder. A complex and a gene are linked if the gene is 
coding for a protein part of the complex. We con- 
structed this graph by integrating the human gene-dis- 
ease network from [3] and protein complexes from the 
CORUM database, as explained in the Methods section. 

An important feature of many biological networks is 
their hierarchical organization, where higher-level struc- 
ture is composed of multiple instances of a lower-level 
structures of different types [24] . This implies that small 
groups of nodes organize in a hierarchical manner to 
increasingly large groups on many different scales [25,26] . 
To account for this topological characteristic we have to 
be able to extract relevant information on an appropriate, 
pre-defined resolution level. 

We addressed this issue by analyzing the very global 
structure and a detailed local level of the disease-gene- 
protein complex network. In the following, we first pre- 
sent the results of a decomposition into large clusters 
which demonstrates that our method is generally applic- 
able to biological data. Then, we discuss smaller clusters 
that allowed for a precise interpretation of single 
elements. 



As discussed before, due to its random initialization 
our algorithm is inherently indeterministic. Different 
clustering results have of course a significant impact on 
the interpretation of the biological meaning of the 
results. We show in Additional File 4 that our algorithm 
is quite stable on graphs with well defined cluster struc- 
ture. To avoid analyzing a local minimum, we discuss 
performance over 10 runs and verify that the disease- 
gene-protein complex network has indeed a defined 
cluster structure in Additional File 5. 

Dealing with a theoretically monotonous cost func- 
tion, it is hard to determine the optimal numbers of 
clusters for each node type in which the graph has to be 
partitioned. Appropriate values are not apparent from 
prior knowledge about our data set. We therefore chose 
desired approximate resolutions m g for the gene parti- 
tion. The number of clusters m c and m d in the protein 
complex and disease partitions were then scaled accord- 
ing to their partitions' sizes (see Methods). We use this 
heuristics, since a brute-force sampling of the three- 
dimensional parameter space is computationally out of 
reach. Then, we looked for plateaus and steep drops in 
the cost function within a certain range around this 
value m g and chose a local optimum of the algorithmi- 
cally found decompositions. In Additional File 5 we per- 
formed simulations showing that the profile of the cost 
function may indeed indicate for a proper number of 
clusters in graphs with known cluster structure. 
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Figure 5 Decomposition of a gene-disease-protein complex network. We integrated the gene-disease network from [3] with human protein 
complexes from the CORUM database [16]. This resulted in a layered tripartite graph, which is schematically drawn in (a). We performed a 10- 
fold approximation of this graph to estimate appropriate numbers of clusters. The boxplot curve (b) shows how the cost function f{H, Q from 
equation (1) depends on the number of gene clusters m g . The true minima of the cost function are decreasing with m gi and this is also visible 
in the approximated minima using our proposed algorithm. Therefore, we are able to identify structures on various resolution levels. The details 
represent the cost function course for large-scale clustering (i) and a decomposition on small scale (ii), respectively. For our detailed analyses, we 
used the decompositions showing steep drops in the cost function marked by the red and green boxes. 



Large-scale clustering 

First, we focused on the identification of large clusters. 
Figure 5b shows the distribution of the cost function 
values after algorithm convergence for each parameter 
setting. In the following discussion, we used (m g) m c) 
m d ) = (10, 5, 6) as it showed the first steep drop in the 
cost function. Moreover, here we observed a significant 
local minimum of the cost function values of the algor- 
ithmically determined decompositions. From the illus- 
tration of the decomposition in Figure 6 we see that the 
resulting clusters vary strongly in size. For all partitions, 
the majority of elements was assigned to a single cluster 
with degree of membership ^ > = 0.9. This demon- 
strates that the analyzed graph has a well defined cluster 
structure at the desired resolution level. The corre- 
sponding histograms are given in Additional File 5. 
Therein we also discuss an example illustrating that 
such large degrees of membership are rarely found in 
graphs lacking any cluster structure. However, there 
exists also a considerable amount of elements assigned 
to several clusters simultaneously, e.g. in complex clus- 
ters 3 and 5, gene clusters 1 and 3 or disease clusters 3 
and 4. This confirms the usefulness of our fuzzy 
approach. 



Cluster evaluation To determine whether the resulting 
clusters are biologically reasonable, we applied GO 
enrichment analysis (see Methods) to the clusters of the 
gene partition. We found, for instance, that for the genes 
in the two overlapping clusters 1 and 3 significantly 
enriched GO-terms are cell cycle and cellular response to 
stimulus/stress. Genes in cluster 4 can be related to e.g. 
death, cell proliferation and developmental processes, 
whereas cluster 6 represents translation. Gene expres- 
s/ow-associated GO-terms such as RNA processing and 
splicing were detected in cluster 7. This shows that our 
method was able to identify biologically meaningful func- 
tionally enriched clusters. The complete tables for GO 
enrichments in all clusters are shown in Additional File 6. 

The interconnectivity of the in total 21 clusters is 
sparse (see Figure 6). The skeleton for the global cluster 
structure for both underlying bipartite graphs (gene- 
complex and gene-disease) demonstrates that locally 
overlapping clusters also tend to interconnect with the 
same clusters of the other partition; for instance, disease 
clusters 3 and 4 are both connected with gene cluster 9. 
To evaluate the extracted backbone graph, in the follow- 
ing we tested the hypothesis that interconnected clusters 
of different partitions are also functionally correlated. 
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Figure 6 Illustration of large-scale cluster structures in the gene-disease-protein complex network The large scale decomposition of the 
gene-disease-protein complex network is illustrated as described in Figure 2b. The hierarchical clustering of the nodes' degrees of membership 
of the (a) complex, (c) gene and the (d) disease partition show that the majority of elements was assigned to single clusters. However, a 
considerable amount of cluster overlaps exists, e.g. for the disease clusters 3 and 4. The backbones for gene-complex (b) and for gene-disease 
(e) are sparsely connected, but show that locally overlapping clusters tend to interconnect with the same clusters of the other partition; e.g. 
disease cluster 3 and 4 are both connected to gene cluster 9 with large weights. 

V ) 
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Gene-complex interconnections Assuming that the 
resulting interconnected gene and complex clusters are 
functionally related, one expects to see a similar profile 
for FunCat annotation and backbone interconnectivity 
of each cluster. This hypothesis was verified in Figure 7, 
where for instance complex cluster 3 and the intercon- 
nected gene clusters 1 and 3 show a high binary FunCat 
correlation. The difference score (as defined in Meth- 
ods) between backbone interconnectivity and annotation 
correlation is 2.48, resulting in a j^-value < 10" 5 . To 
compare the results of the fuzzy clustering approach 
with the results for the disjoint clustering method from 
[14] we applied the algorithm with the same parameter 
settings and identical annotation and randomization 
procedure to the obtained clusters. For hard clustering 
we achieved a larger difference score of 2.99 which cor- 
responds to a significant Rvalue of 0.0015. 



Gene-disease interconnections To ascertain that our 
method is able to detect biological feasible clusters in all 
partitions, we determined for each gene and disease 
cluster disorder class profiles. Again, we observed a high 
similarity between backbone interconnectivity and disor- 
der correlation having a difference score of 1.09 (/rvalue 
< 10" 5 ). For instance, gene cluster 1 and 10 and the 
interconnected disease clusters 1 and 5 show a high dis- 
order correlation (see Figure 7). 
Small-scale clustering 

We showed that our method is able to both detect and 
interconnect biologically meaningful clusters. However, 
due to their size of about 279 genes on average the sin- 
gle clusters are hard-to-interpret. The detection of smal- 
ler clusters representing biological units enables a 
precise biological interpretation. In the following, we 
describe results for (m g) m c , m d ) = (222, 135, 112), 
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Figure 7 Evaluation of the backbone of the gene-disease-protein complex network. To evaluate the large-scale clustering we additionally 
included functional annotations, (a) and (b) compare the gene-complex backbone graph with the functional correlations of the extracted 
clusters according to FunCat annotation. Similarly, (d) and (e) show the gene-disease backbone and the clusters' disorder class correlations (see 
Methods). We see that interconnected clusters also seem to correlate in their annotations. To test this hypothesis rigorously, we calculated 
difference scores as defined in Methods in order to quantify the correlation of the backbones and their annotations, respectively. Vertical lines in 
(c) and (f) correspond to these difference scores for the fuzzy (black) and the hard (red) clustering. Comparing these values to the difference 
scores for 10 5 randomized cluster assignments we obtain significant p-values, both < 10" 5 . The correlations between annotations of connected 
clusters of the backbone is higher when applying the fuzzy approach. 
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where we found the lowest value of the cost function 
(see Figure 5b). This setting accounts for an average 
cluster size of 10 genes. 

In order to make use of the cluster overlaps, we 
looked for genes assigned to more than one cluster with 
a degree of membership of ft > 0.2. We considered this 
threshold as significant as it is 50-fold higher than 
assigning each gene uniformly to all 222 gene clusters 
with equal degree of membership ft = 0.0045. 

As a showcase we chose MECP2, a protein that func- 
tions as a key factor in epigenetic transcriptional regula- 
tion. It is known to be involved in neurodevelopmental 
and psychiatric disorders such as Autism, Mental retar- 
dation and Angelman syndrome [3,27,28], and was 
assigned to three distinct gene clusters: 25 (ft - 0.42), 32 
(ft - 0.31), 200 (ft - 0.24). These clusters mainly cover 
neurological (23%), psychiatric (81%) and pleiotropic 
(7%) genes having a degree of membership > 0.2. This 
is illustrated in Figure 8, where we visualized the back- 
bone interconnectivity and the fuzzy clustering of the 
nodes in the neighborhood of MECP2. 

We then analyzed the nine disease clusters intercon- 
nected with the three gene clusters in the backbone net- 
work. In total, 45 disorders representing mainly 
psychiatric (66%) and neurological (20%) disorders were 
assigned to eight disease clusters with a degree of mem- 
bership of {i > 0.2; 6 out of 9 psychiatric disorders avail- 
able in the network analyzed are present in three 
disease clusters. 

Another large fraction of these disease clusters are dis- 
orders classified as multiple. Most of them (Shprintzen- 
Goldberg syndrome or Aarskog Scott syndrome) show 
also neurological diseases such as mental retardation 
[29,30]. We also identified the ophthamological disorder 
Blepharospasm, an adult-onset focal dystonia that causes 
involuntary blinking and eyelid spasms [31] for that a 
known polymorphism in the dopamine receptor DRDS 
is associated with [32]. This is a subform of Dystonia 
and classified as a neurological disorder (ICD-10 G24.5) 
by the WHO [33]. 

Furthermore, we found Anorexia nervosa to be present 
in the analyzed clusters. It is annotated as a nutritional 
disorder by [3], however it represents a life-threatening 
complex psychiatric disorder [34]. Another so far 
unclassified disease Alcohol dependence was assigned to 
the interconnected cluster. It is classified as a mental 
and behavioral disorder (ICD-10 F10.2) and in a broader 
sense can be considered as psychiatric disorder. 

In contrast, applying the hard clustering algorithm, 
MECP2 was assigned to a single gene cluster which is 
connected to two disease clusters. Although all asso- 
ciated disorders were identified correctly, no further 
information could be obtained from the clusters. 



However, [27] reports an epigenetic overlap in autism- 
spectrum neurodevelopmental disorders as MECP2 
affects the regulation of UBE3A expression. These rela- 
tions became immediately apparent in the cluster result 
of our fuzzy approach: Both genes were mutually 
assigned to gene cluster 25 that identifies the phenotypic 
and genotypic overlaps, whereas direct links to known 
connected genes are missing in the hard clustering (see 
Figure 8). 

Conclusions 

The widespread application of high-throughput methods 
such as microarrays or next generation sequencing has 
considerably increased the amount of experimental data 
and the information available in biomedical literature 
that is accessible to text-mining approaches [35]. These 
data can usually be represented in terms of networks. 
Over the last years, networks have emerged as an 
invaluable tool for describing and analyzing complex 
systems. However, we need to take into account that 
network information is commonly available for various 
types of nodes. Especially integrative biological networks 
are /c-partite [3,36]. 

Another important feature of biological networks is their 
hierarchical organization, implying that small groups of 
nodes organize in a hierarchical manner to increasingly 
larger groups on many different scales [24-26]. This neces- 
sitates the analysis of these objects on various resolution 
levels. Furthermore, many proteins or genes are pleiotro- 
pic, and often associated with many functions. Hence, 
clustering algorithms that assign elements into several 
functional modules are essential [10,12,37]. 

We presented a novel computationally efficient and 
scalable graph clustering algorithm that is capable to 
deal with all these described issues. Further, it does not 
require any a priori knowledge about the data set. 
Results on a tripartite network, constructed by integrat- 
ing the human disease network and protein complexes, 
demonstrated that we could identify and interconnect 
biologically meaningful clusters on different scales. 
Overlapping modules gave a more comprehensive pic- 
ture of e.g. gene-disease connections than looking at dis- 
joint clusters alone. Summarizing, the proposed fuzzy 
clustering algorithm is suitable to compress and approx- 
imate the underlying topology of heterogeneous biologi- 
cal networks, which facilitates the understanding of such 
networks on multiple scales. It is freely available and 
readily applicable to many further problems. 

Methods 

Derivation of the update rules 

We want to minimize f(H, C) in equation (1) using a 
local algorithm extending gradient descent. Let D (//) : = 
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Figure 8 The small-scale clustering in the neighborhood of MECP2. We draw the results - the backbone network and the nodes' degrees of 
membership to clusters, thresholded by \d > 0.2 - of the small-scale clustering in the neighborhood of MECP2 using the fuzzy (a) and the hard 
clustering (b). Nodes are colored according to their disorder class annotations. Blue edges indicate backbone interconnectivity, grey edges 
cluster assignment. Edge thickness indicates the degree of membership. MECP2 is connected to three gene clusters mainly covering neurological 
(red) and psychiatric (purple) genes. The seven interconnected disease clusters also represent mainly psychiatric and neurological disorders. Also 
unclassified disorders are present such as e.g. Alcohol dependence (white), which is classified as a mental and behavioral disorder. In a broader 
sense, however, it can be considered as psychiatric disorder. Applying the hard clustering (b), MECP2 is assigned to gene cluster 209 which is 
connected to two disease clusters only. Although all associated disorders are identified correctly, in contrast to the fuzzy clustering no further 
information can be obtained from the decomposition. 
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f = l 



C {i) B {iJ) (C (J) ) T denote the residuals, then 



i<j,k,l 



{d^f . Hence 



we follow Lee and Seung's idea for NMF [13] and rewrite 
this into multiplicative update rules. We therefore choose 
update rates 
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We assume an undirected /c-partite graph, so A ( ' 7) is 
undefined for i >j. For simplicity of notation, we now 
set A ( ^: = (A^) T for i >j (and similarly for the /c-partite 
graph //). Then D {iJ) = (D^V, and the differential sim- 
plifies to 



Plugging this into the gradient descent equations, we 
finally get: 



(( C W)T A «) C (i)) 
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Commonly, in order to extend cost functions in (uni- 
partite) data clustering to include fuzzy clusters, a so- 
called fuzzification factor is introduced [11,38]]. Instead 
Altogether, by replacing the residuals, we have shown of squared norm minimization of the residuals D (lJ \ a 



(C (i) ) T C (i) B (ij) (C (i) ) T C (j) ) r 
3 ^ = -2^(A^C^(B^) T - 



3c 



cWB (fj) (C (j) ) T C (j) (B (ij) ) T ) r 



If we are to minimize /by alternating gradient des- 
cent, we start from an initial guess of B (/;) , C w . Then, 
we alternate between updates of the B (/;) and the C w 

with learning rates r\\ { p and 77 respectively: 

^^"^^ Vi,j:i<i 



These update rules have two disadvantages: first, the 
choice of update rate 77 (possibly different for B, C and /, ;) 
is unclear; in particular, for too small 77 convergence may 
take too long or may not be achieved at all, whereas for 
too large 77 we may easily overshoot the minimum. More- 
over, the resulting matrices may become negative. Hence 



higher residual power is minimized, which results in 
overlapping non-trivial cluster assignments. However, 
we see that in our examples, already the standard case is 
sufficient. This is because we are interested in co-clus- 
tering, which is different from standard data clustering 
where only a unipartite graph and hence C w = C (1) is 
assumed. 

Evaluation on simulated data 

We built a random, modularly structured /c-partite net- 
work as follows: We fix the number of clusters m/ of 
nodes with color /, i - 1, ... k. The backbone graph is 
initialized by x m ; -matrices B ( ^ filled with zeros. We 
added uniformly random ones in each column according 
to a set percentage a (here on average a >1 ones in each 
column) such that each row has at least a single non-zero 
entry. In order to construct the actual network A, we 
split up A (//) into wii • m ; blocks of a fixed chosen cluster- 
size (here 10). We fixed a cluster connectivity /3 and a 
random connectivity y </?. Now, for each non-zero entry 
in B (//) , we set the corresponding block of A (//) to a ran- 
dom Erdos-Renyi graph [39] with density /?. Finally the 
clusters are connected by replacing each zero block of 
A (//) with an Erdos-Renyi graph of the lower connectivity 
7. We analyzed 1000 realizations of four network proto- 
types with increasing complexity (parameters are given in 
Table 1). In order to compare algorithm performance, we 
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Table 1 Random data models for evaluation of the fuzzy 
clustering algorithm 



model k 


m 


a 


P 


r 


description 


1 2 


(3, 3) 


1 


0.7 


0.2 


equal-sized, no overlap 


2 2 


(3, 4) 


1 


0.7 


0.2 


no cluster overlap 


3 3 


(3, 4, 5) 


1.2 


0.6 


0.1 


3-partite, low-noise 


4 3 


(3, 4, 5) 


1.2 


0.8 


0.2 


3-partite, noisy 



Parameters for the simulated data models, k denotes the number of partitions 
of the network, m is a vector with the number of clusters in each partition, a 
the backbone connectivity, f5 the cluster and / the noise connectivity. 



determined algorithm runtime, final cost function value 
and quality of cluster estimation. Cluster estimation qual- 
ity was measured by the summed up Frobenius norms of 
the difference between the true C w and the estimated 

£(0 , where clusters have been permuted such as to give 

minimal difference (permutation indeterminacy). 

Construction of a disease-gene-protein complex graph 

We constructed a layered, tripartite graph by enlarging 
the human disease network [3] by all human protein 
complexes from the CORUM core set (as of July 2009) 
[16]. Integrating both data sets resulted in a graph of 
5672 nodes and 7795 edges with all genetic disorders, 
all known disease genes and human protein complexes. 
We extracted the largest connected component resulting 
in a network with | V\ = 3737 and \E\ = 6219. It consists 
of 854 complexes (V c ), 590 diseases (V d ) and 2293 genes 
(V g ) (see Additional File 7). 

Parameter determination 

We determined parameters for clustering on different 
scales. For large-scale clustering, we approximated the 
number of clusters to be found for each node type by 
limiting the maximal number of gene clusters m g for V g 

to m g = m g = ^\V g \ /2 ^ as suggested in [40]. The 

number of complex clusters m c for V c and disease clus- 
ters m d for V d were then scaled according to m g by t - 

m i = |" m s^/i ^TT/T^l] > where i e {c, d}. To detect 
smaller clusters, we set the maximum number of gene 

clusters to m g for V g according to m g = m g = — j-^- 1 . 

This resulted in a minimum average cluster size of 10 
genes. Parameters m c and for V c and V d were scaled 
as previously. 

Cluster evaluation 

We validated the gene clusters using Gene Ontology 
(GO) enrichment analysis. To this end, the genes used 
in the analysis (degree of membership ft > 0.2) were 



tagged with their respective GO categories and analyzed 
within each cluster for overrepresentation of certain 
categories versus the "background" level of the popula- 
tion (in this case, all genes in the tripartite graph). We 
used Ontologizer [41] with the setting "Parent-Child- 
Intersection" restricting the analysis to the biological 
process category. For multiple testing correction we 
employed Bonferroni correction. To assign GO terms to 
gene sets, a j?-value cutoff of 0.05 was used. 

For evaluating the cluster interconnectivity we 
employed FunCat [42] classifications for all genes and 
protein complexes. We used FunCat, as Gene Ontology 
associations for genes could be mapped to their accord- 
ing FunCat categories, but not vice versa. A subset of 13 
main categories was used, subcategory annotations were 
mapped to corresponding main category terms. Disorder 
classifications for genes and diseases were taken from 
[3], where classification classes grey and multiple were 
combined for pleiotropic genes (see Additional File 8). 
We calculated Pearson's correlation coefficients between 
cluster FunCat/disorder annotations by weighting a clus- 
ter element's classification by its degree of membership 
to the particular cluster. The difference score between 
normalized backbone interconnectivity and annotation 
correlation was determined using the Frobenius norm of 
their difference. 

Null model 

Null models for the evaluation of the backbone graph in 
the large-scale clustering were generated by applying a 
weighted bipartite randomization procedure to each par- 
tition-cluster subgraph C w . To this end, we generalized 
the degree preserving rewiring of complex networks first 
introduced by [43]. In the weighted case, one has to 
decide between preserving either the number of neigh- 
bors of all nodes, or the total weight of their adjacent 
edges. We chose to maintain the first quantity: In every 
randomization step we randomly picked two edges and 
exchange their endpoints of the partition type, thereby 
keeping the weights attached to the edges. With this we 
also conserved the weighted degree of the partition 
nodes which reflects the right-stochasticity of the fuzzy 
clusterings. The degree of randomization can be moni- 
tored by a loss of degree-correlations between first and 
second neighbors. In practice, correlations vanish after 
about one randomization step per edge. So, for our ana- 
lyses we used five times this number as suggested in 
[44]. The ^-values were calculated over 10000 runs. 

Availability and requirements 

Project name: Fuzzy clustering of k-partite graphs. 

Project home page: http://cmb.helmholtz-muenchen. 
de/fuzzyclustering. 

Operating system(s): Platform independent. 
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Programming language: MATLAB/Octave. 
Other requirements: MATLAB 7.1 or higher (no 
additional toolboxes required) or Octave. 
License: Free for non-commercial purposes. 

Additional material 



Additional file 1: MATLAB source code. Fuzzy k-partite graph 
clustering algorithm for MATLAB. 

Additional file 2: Octave source code. Fuzzy k-partite graph clustering 
algorithm for Octave. 

Additional file 3: Simulations on algorithm runtime. Verification of 
the estimation of the algorithm's time complexity by simulations. 

Additional file 4: Simulation on cluster stability. Analysis of the 
algorithm's stability towards the random initialization. 

Additional file 5: The chosen number of clusters. Analysis of the cost 
function as an indicator for determining the number of clusters. We 
study the stability of the clusterings with respect to this choice and give 
evidence that the gene-disease-complex graph is modularly structured. 

Additional file 6: GO enrichment analysis for the gene clusters from 
the large-scale clustering. Tables 1-10 show the GO (Gene Ontology) 
enrichment using Ontologizer [41] for the ten gene clusters in the large- 
scale clustering. We used only genes having a degree of membership u 
> 0.2 (see Methods). 

Additional file 7: Integrated tripartite network. Illustration of the 
largest connected component of the layered, tripartite graph gene- 
disease-protein complex network. It consists of 2293 gene (green), 590 
disease (red) and 854 complex (blue) nodes connected by 6219 edges. 

Additional 8: FunCat and disorder class annotation tables. Table 1 
shows the FunCat classes used for evaluating the gene and protein 
complex clusters. A subset of 13 FunCat main categories was taken from 
CORUM. Table 2 represents the 20 primary disorder classes retrieved 
from Goh et al. (2007). Additional classes are multiple, grey and 
unclassfied. 
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